NSF PAR Search | NSF Public Access Repository

MaSk-LMM: A Matrix Sketching Framework for Linear Mixed Models in Association Studies

Burch, Myson; Bose, Aritra; Dexter, Gregory; Parida, Laxmi; Drineas, Petros (November 2024, International Conference on Research in Computational Molecular Biology (RECOMB))

Full Text Available
Matrix sketching framework for linear mixed models in association studies

https://doi.org/10.1101/gr.279230.124

Burch, Myson; Bose, Aritra; Dexter, Gregory; Parida, Laxmi; Drineas, Petros (September 2024, Genome Research)

Linear mixed models (LMMs) have been widely used in genome-wide association studies to control for population stratification and cryptic relatedness. However, estimating LMM parameters is computationally expensive, necessitating large-scale matrix operations to build the genetic relationship matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to such matrix operations by leveraging matrix sketching, which often results in provably accurate, fast, and efficient approximations. We leverage matrix sketching to develop a fast and efficient LMM method called Matrix-Sketching LMM (MaSk-LMM) by sketching the genotype matrix to reduce its dimensions and speed up computations. Our framework comes with both theoretical guarantees and a strong empirical performance compared to the current state-of-the-art for simulated traits and complex diseases.
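As a rough illustration of the idea in this abstract, the sketch below compresses the marker dimension of a standardized genotype matrix with a Gaussian sketching matrix and compares the resulting approximate GRM with the exact one. It is a minimal sketch under stated assumptions, not the MaSk-LMM implementation: the simulated genotypes, the choice of a Gaussian sketch, the sizes n, m, and s, and the function names exact_grm and sketched_grm are all hypothetical and chosen only for illustration.

```python
import numpy as np


def exact_grm(X):
    """Exact GRM, X X^T / m, for a standardized n x m genotype matrix."""
    return X @ X.T / X.shape[1]


def sketched_grm(X, sketch_size, rng):
    """Approximate GRM computed from a sketch of the marker dimension."""
    m = X.shape[1]
    # Gaussian sketching matrix (an assumption), scaled so E[S S^T] = I_m.
    S = rng.standard_normal((m, sketch_size)) / np.sqrt(sketch_size)
    X_sketch = X @ S  # n x sketch_size instead of n x m
    return X_sketch @ X_sketch.T / m


rng = np.random.default_rng(0)
n, m, s = 20, 2000, 500  # individuals, markers, sketch size (illustrative)

# Simulated genotypes coded 0/1/2; not real data.
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)
# Drop monomorphic markers so every column can be standardized.
genotypes = genotypes[:, genotypes.std(axis=0) > 0]
X = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)

G = exact_grm(X)
G_hat = sketched_grm(X, s, rng)
rel_err = np.linalg.norm(G - G_hat) / np.linalg.norm(G)
print(f"relative Frobenius error of the sketched GRM: {rel_err:.3f}")
```

Increasing the sketch size s should reduce the approximation error at the cost of more computation; the trade-off shown here is specific to this toy setup.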
Full Text Available
Structure-informed clustering for population stratification in association studies

https://doi.org/10.1186/s12859-023-05511-w

Bose, Aritra; Burch, Myson; Chowdhury, Agniva; Paschou, Peristera; Drineas, Petros (October 2023, BMC Bioinformatics)

Abstract Background: Identifying variants associated with complex traits is a challenging task in genetic association studies due to linkage disequilibrium (LD) between genetic variants and population stratification, unrelated to the disease risk. Existing methods of population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants. Results: To overcome this, we propose CluStrat, which corrects for complex arbitrarily structured populations while leveraging the linkage disequilibrium induced distances between genetic markers. It performs an agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat on WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in Schizophrenia and Myocardial Infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans. Conclusions: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance.
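The two ingredients named in this abstract, Mahalanobis distances and agglomerative hierarchical clustering, can be sketched as follows. This is a hedged toy example and not the CluStrat code: the random feature matrix, the average-linkage rule, the naive merge loop, and the function names mahalanobis_distances and agglomerative_clusters are assumptions made for illustration.

```python
import numpy as np


def mahalanobis_distances(X):
    """Pairwise Mahalanobis distances between the rows of X."""
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X[:, None, :] - X[None, :, :]
    sq = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(sq, 0.0))


def agglomerative_clusters(D, n_clusters):
    """Naive average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between all cross-cluster pairs.
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(D.shape[0], dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


rng = np.random.default_rng(0)
n_samples = 30
# Hypothetical feature matrix with correlated columns; not real genotypes.
features = rng.standard_normal((n_samples, 3))
features[:, 1] += 0.8 * features[:, 0]

D = mahalanobis_distances(features)
labels = agglomerative_clusters(D, n_clusters=2)
print("cluster sizes:", np.bincount(labels))
```

Unlike the Euclidean distance, the Mahalanobis distance accounts for correlation between features, so it is unchanged by an invertible linear rescaling or mixing of the columns.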
A Fast, Provably Accurate Approximation Algorithm for Sparse Principal Component Analysis Reveals Human Genetic Variation Across the World

https://doi.org/10.1007/978-3-031-04749-7_6

Chowdhury, Agniva; Bose, Aritra; Zhou, Samson; Woodruff, David P.; Drineas, Petros (January 2022, Research in Computational Molecular Biology - 26th Annual International Conference)

Full Text Available
Integrating Linguistics, Social Structure, and Geography to Model Genetic Diversity within India

https://doi.org/10.1093/molbev/msaa321

Bose, Aritra; Platt, Daniel E; Parida, Laxmi; Drineas, Petros; Paschou, Peristera (January 2021, Molecular Biology and Evolution)
Heyer, Evelyne (Ed.)
Abstract India represents an intricate tapestry of population substructure shaped by geography, language, culture, and social stratification. Although geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further levels of complexity to understand Indian population structure. To date, no study has attempted to model and evaluate how these factors have interacted to shape the patterns of genetic diversity within India. We merged all publicly available data from the Indian subcontinent into a data set of 891 individuals from 90 well-defined groups. Bringing together geography, genetics, and demographic factors, we developed Correlation Optimization of Genetics and Geodemographics to build a model that explains the observed population genetic substructure. We show that shared language along with social structure have been the most powerful forces in creating paths of gene flow in the subcontinent. Furthermore, we discover the ethnic groups that best capture the diverse genetic substructure using a ridge leverage score statistic. Integrating data from India with a data set of additional 1,323 individuals from 50 Eurasian populations, we find that Indo-European and Dravidian speakers of India show shared genetic drift with Europeans, whereas the Tibeto-Burman speaking tribal groups have maximum shared genetic drift with East Asians.
Full Text Available
CluStrat: A Structure Informed Clustering Strategy for Population Stratification

https://doi.org/10.1007/978-3-030-45257-5_19

Bose, Aritra; Burch, Myson; Chowdhury, Agniva; Paschou, Peristera; Drineas, Petros (January 2020, Research in Computational Molecular Biology)

Full Text Available
TeraPCA: a fast and scalable software package to study genetic variation in tera-scale genotypes

https://doi.org/10.1093/bioinformatics/btz157

Bose, Aritra; Kalantzis, Vassilis; Kontopoulou, Eugenia-Maria; Elkady, Mai; Paschou, Peristera; Drineas, Petros; Schwartz, Russell (April 2019, Bioinformatics)

Full Text Available
